AMENDMENTS TO THE CLAIMS

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

1. (Currently Amended) A method of performing speaker verification to
determine whether a speaker is a registered speaker, the method comprising:

a) obtaining an array [[a plurality]] of frames of compressed audio formants
representing the speaker uttering a predetermined pass phrase, each frame within
the array including:

i) energy data and pitch data characterizing the residue of the 
speaker uttering the predetermined pass phrase; and 

ii) a plurality of formant coefficients characterizing the resonance 
of the speaker uttering the predetermined pass phrase; and 

b) performing a time domain normalization of the array of frames of 
compressed audio formants to a sample array of frames of compressed audio 
formants such that the two arrays are of an equal quantity of frames;

c) determining whether the speaker is the registered speaker [[verifying
the identity of the speaker]] by:

generating an array of discrepancy values, each discrepancy value
representing the difference between [[matching at least]] one of: i) an energy data
value; ii) a pitch data value; and iii) a formant [[coefficients]] coefficient value of a
frame of the array and a corresponding energy value; ii) pitch value; and iii) formant
coefficient value of a corresponding frame in the sample array; and

determining whether the array of discrepancy values is within a
predetermined threshold [[in the frames to at least one of energy, pitch, and
formant coefficients of a plurality of sample frames stored in memory]].
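
By way of non-limiting illustration only, the comparison recited in step c) might be sketched as follows. The claim does not state how a discrepancy value is computed or how the array is tested against the threshold; the `Frame` type, the sum of absolute differences, and the mean-based threshold test below are all assumptions made for the sketch, and every name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    """Hypothetical container for one frame of compressed audio formants."""
    energy: float
    pitch: float
    formants: Sequence[float]  # formant coefficient values


def discrepancy_array(array: List[Frame], sample: List[Frame]) -> List[float]:
    """Return one discrepancy value per pair of corresponding frames.

    Assumption: a discrepancy value is the sum of absolute differences of
    energy, pitch, and each formant coefficient. The claim leaves this open.
    """
    if len(array) != len(sample):
        raise ValueError("arrays must first be normalized to equal length")
    values = []
    for a, s in zip(array, sample):
        d = abs(a.energy - s.energy) + abs(a.pitch - s.pitch)
        d += sum(abs(x - y) for x, y in zip(a.formants, s.formants))
        values.append(d)
    return values


def is_registered_speaker(values: List[float], threshold: float) -> bool:
    """Assumption: 'within a predetermined threshold' means mean <= threshold."""
    return bool(values) and sum(values) / len(values) <= threshold
```

In this sketch both arrays are assumed to have already been brought to an equal quantity of frames by step b).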

2. (Currently Amended) The method of performing speaker verification of claim 
22, [[1,]] wherein the step of obtaining an array of frames of compressed audio
formants includes receiving the frames of compressed audio formants from a
remote Internet telephony device. 

3. (Currently Amended) The method of performing speaker verification of claim 
2, wherein the step of obtaining an array of frames of compressed audio formants 
from the remote Internet telephony device comprises receiving audio input of the 
speaker uttering the pass phrase from a microphone, and digitizing the audio input, 
converting the digitized audio input to a sequence of frames of compressed audio 
formants [[further compressing the sequence of frames of compressed audio
formants to generate compressed audio data packets, and sending the
compressed audio data packets from the remote Internet telephony device]].

4. (Canceled) 

5. (Currently Amended) A method of determining whether a speaker is a 
registered speaker, the method comprising: 

a) obtaining compressed audio formants for each frame of an array of 
frames representing the speaker uttering a predetermined pass phrase; [[the
compressed audio formants including:

i) energy data and pitch data characterizing the residue of the
speaker uttering the predetermined pass phrase;

ii) formant coefficients characterizing the resonance of the
speaker uttering the predetermined pass phrase;]]

b) performing a time domain normalization of the array to a sample 
array of frames stored in a memory and representing the registered speaker 
uttering the predetermined pass phrase to decimate a portion of the frames of the 
larger of the two arrays such that the two arrays, after decimation, are of an equal
quantity of frames, the portion of the frames to be decimated being selected by:

selecting a plurality of audio formant decimation groups, each 
audio formant decimation group being a selection of frames from the larger of the 
two arrays which, if decimated, yields the best alignment between a formant
coefficient value of each frame of the array and the corresponding formant
coefficient value of each frame of the sample array; and

determining a decimation group of frames from the larger of
the two arrays, the decimation group being a quantity of frames equal to the
quantity of frames to be decimated and being the frames which are selected by
weighted average from each of the audio formant decimation groups;

c) generating an array of discrepancy values, each discrepancy value 
representing the difference between one of an audio formant value of a frame of 
the array and a corresponding audio formant value of a corresponding frame of the 
sample array; and 

d) determining that the remote speaker is the registered speaker if the 
array of discrepancy values is within a predetermined threshold [[determining
whether the speaker is the registered speaker by matching at least one of energy,
pitch, and formant coefficients from the compressed audio formants to
predetermined combinations of at least one of energy, pitch, and formant
coefficients of sample compressed audio formants known to represent the
registered speaker]].

6. (Currently Amended) The method of determining whether a speaker is a 
registered speaker of claim 23, [[5,]] wherein the step of obtaining compressed audio
formants includes obtaining the compressed audio formants from a remote location 
and sending the compressed audio formants from the remote location. 

7. (Original) The method of determining whether a speaker is a registered 
speaker of claim 6, wherein the step of obtaining compressed audio formants at a 
remote location includes receiving audio input of the speaker uttering the pass 
phrase from a microphone, digitizing the audio input, and compressing the digitized 
audio input to generate compressed audio formants. 

Claims 8-9 (Canceled)

10. (Currently Amended) A speaker verification server for determining whether
[[verifying the identity of]] a remote speaker is a registered speaker, the server
comprising:

a) a network interface for receiving, via a packet switched network,
compressed audio formants for each frame of an array of frames [[via a packet
switched network]] representing the [[a]] remote speaker uttering a predetermined pass
phrase as audio input to a remote telephony client;

b) a database storing compressed audio formants for each frame of a
sample array of frames [[a plurality of compressed audio formant samples, each]]
representing the [[a]] registered speaker uttering [[a registered]] the predetermined pass
phrase as audio input; and

c) a verification application operatively coupled to each of the network
interface and the database for comparing the compressed audio formants of the
array of frames to the compressed audio formants of the sample array of frames
[[representing the remote speaker to a compressed audio formant sample]] to
determine whether the remote speaker is the registered speaker by:

performing a time domain normalization of the array to the 
sample array such that the two arrays are of an equal quantity of frames;

generating an array of discrepancy values, each discrepancy 
value representing the difference between one of an audio formant value of a 
frame of the array and a corresponding audio formant value of a corresponding 
frame of the sample array; and 

determining that the remote speaker is the registered speaker 
if the array of discrepancy values is within a predetermined threshold. 

11. (Currently Amended) The speaker verification server of claim 24, [[10,]]
wherein the compressed audio formants include energy data and pitch data
characterizing the residue of the speaker uttering the predetermined pass phrase
and formant coefficients characterizing the resonance of the speaker uttering the
predetermined pass phrase; and each frame [[compressed audio formant sample]]
includes an energy value and a pitch value [[data]] characterizing the residue of the
registered speaker uttering the registered pass phrase and formant coefficient
values [[coefficients]] characterizing the resonance of the registered speaker uttering
the registered pass phrase.

12. (Currently Amended) The speaker verification server of claim 11, wherein
the verification application determines the decimation group of frames by: 

selecting a pitch decimation group of frames from the larger of the two
arrays, the pitch decimation group being the selection of frames which, if
decimated, yields the best alignment between the pitch values of the two arrays
after decimation;

selecting an energy decimation group of frames from the larger of the two
arrays, the energy decimation group being the selection of frames which, if
decimated, yields the best alignment between the energy values of the two arrays
after decimation;

selecting a plurality of formant coefficient decimation groups, each formant
coefficient decimation group being a selection of frames from the larger of the two
arrays which, if decimated, yields the best alignment between the formant
coefficient values of the two arrays after decimation; and

selecting frames from the larger of the two arrays for the decimation group
by weighted average from the pitch decimation group, the energy decimation
group, and each formant coefficient decimation group [[determines whether the at
least one of energy, pitch, and formant coefficients from the compressed audio
formants is similar to the at least one of the energy, pitch, and formant coefficients
of the sample]].

Claims 13-21 (Canceled) 

22. (New Claim) The method of performing speaker verification of claim 1,
wherein performing a time domain normalization comprises: 

comparing the quantity of frames in the array with the quantity of frames in 
the sample array to determine the quantity of frames to be decimated from the 
larger of the two arrays such that the two arrays are of an equal quantity of frames; 

selecting a pitch decimation group of frames from the larger of the two 
arrays, the pitch decimation group being the selection of frames which, if 
decimated, yields the best alignment between the pitch values of the two arrays 
after decimation; 

selecting an energy decimation group of frames from the larger of the two 
arrays, the energy decimation group being the selection of frames which, if 
decimated, yields the best alignment between the energy values of the two arrays 
after decimation; 

selecting a plurality of formant coefficient decimation groups, each formant 
coefficient decimation group being a selection of frames from the larger of the two 
arrays which, if decimated, yields the best alignment between the formant 
coefficient values of the two arrays after decimation; and 

determining a decimation group of frames from the larger of the two arrays, 
the decimation group being a quantity of frames equal to the quantity of frames to 
be decimated and being the frames which are selected by weighted average from 
the pitch decimation group, the energy decimation group, and each formant 
coefficient decimation group; and 

decimating the decimation group of frames from the larger of the two arrays. 
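
Purely as an illustrative sketch, the normalization recited in claim 22 could be approximated as shown below. The claim does not define "best alignment" or how the "weighted average" is taken; here alignment is assumed to mean the smallest sum of absolute differences after removal, the weighted average is approximated by weighted voting across the per-feature groups, and the exhaustive search is practical only for short arrays. All function names, the `tracks` dictionaries, and the weights are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def alignment_cost(larger: Sequence[float], smaller: Sequence[float],
                   dropped: set) -> float:
    """Sum of absolute differences after removing the dropped frames (assumed metric)."""
    kept = [v for i, v in enumerate(larger) if i not in dropped]
    return sum(abs(a - b) for a, b in zip(kept, smaller))


def best_group(larger: Sequence[float], smaller: Sequence[float]) -> Tuple[int, ...]:
    """Frame indices which, if decimated, give the lowest alignment cost.

    Exhaustive search over all groups of the required size; illustrative only.
    """
    k = len(larger) - len(smaller)
    return min(combinations(range(len(larger)), k),
               key=lambda g: alignment_cost(larger, smaller, set(g)))


def decimation_group(tracks_larger: Dict[str, Sequence[float]],
                     tracks_smaller: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> List[int]:
    """Combine per-feature groups (pitch, energy, each formant coefficient).

    Assumption: 'weighted average' is modelled as weighted voting, keeping the
    k frame indices with the highest total weight.
    """
    names = list(tracks_larger)
    n = len(tracks_larger[names[0]])
    k = n - len(tracks_smaller[names[0]])
    votes = [0.0] * n
    for name in names:
        for i in best_group(tracks_larger[name], tracks_smaller[name]):
            votes[i] += weights[name]
    ranked = sorted(range(n), key=lambda i: (-votes[i], i))
    return sorted(ranked[:k])


def decimate(frames: Sequence, group: Sequence[int]) -> list:
    """Remove the decimation group of frames from the larger array."""
    drop = set(group)
    return [f for i, f in enumerate(frames) if i not in drop]
```

Each entry of `tracks_larger` and `tracks_smaller` is assumed to hold one feature value per frame (for example all pitch values), with the larger array supplied first.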

23. (New Claim) The method of determining whether a speaker is a registered 
speaker of claim 5, wherein 

determining the decimation group of frames comprises: 
selecting a pitch decimation group of frames from the larger of the two 
arrays, the pitch decimation group being the selection of frames which, if 
decimated, yields the best alignment between the pitch values of the two arrays
after decimation; 

selecting an energy decimation group of frames from the larger of the two 
arrays, the energy decimation group being the selection of frames which, if 
decimated, yields the best alignment between the energy values of the two arrays 
after decimation; 

selecting a plurality of formant coefficient decimation groups, each formant 
coefficient decimation group being a selection of frames from the larger of the two 
arrays which, if decimated, yields the best alignment between the formant 
coefficient values of the two arrays after decimation; and

selecting frames from the larger of the two arrays for the decimation group 
by weighted average from the pitch decimation group, the energy decimation 
group, and each formant coefficient decimation group. 

24. (New Claim) The speaker verification server of claim 10, wherein the
verification application performs time domain normalization by: 

comparing the quantity of frames in the array with the quantity of frames in 
the sample array to determine the quantity of frames to be decimated from the 
larger of the two arrays such that the two arrays are of an equal quantity of frames; 

selecting a plurality of audio formant decimation groups, each audio formant 
decimation group being a selection of frames from the larger of the two arrays 
which, if decimated, yields the best alignment between a formant coefficient value 
of each frame of the array and the corresponding formant coefficient value of
each frame of the sample array after decimation; and 

determining a decimation group of frames from the larger of the two arrays, 
the decimation group being a quantity of frames equal to the quantity of frames to 
be decimated and being the frames which are selected by weighted average from 
each of the audio formant decimation groups; and 

decimating the decimation group of frames from the larger of the two arrays. 